1 Overview

How can quantitative methods allow historians to make sense of the ever-increasing wealth of digitalised sources, both numeric and textual? How can historians use quantitative and computational methods to gain a better overview of their source base, ask new questions, and supplement and enhance close reading?

This course provides an in-depth introduction to quantitative and computational methods, covering some of the techniques most widely used in research in the historical and social sciences. The hands-on course will teach students to apply basic quantitative methods to real historical information, both quantitative and qualitative, as well as equip them with the necessary background to understand and interpret the historical literature using these methods.

No background in statistics is required and statistical theory and mathematics will be kept to a minimum. The goal is to provide students with the tools to critically engage with the literature relying on quantitative methods and to be able to conduct original research using these tools in academia, the public sector, or business. In the process, students will learn basic programming skills in R, a statistical software widely used by practitioners in many different fields both inside and outside academia.

2 Goals

The course revolves around two main themes: - classic statistical methods: descriptive statistics, correlation and regression analysis. - computational content analysis: natural language processing (including word frequency, dictionary methods, text classification), topic models, digital corpus management and web scraping.

Taught intensively over five days, the course is structured into 10 three-hour sessions combining lectures (approx. 1 hour) and applied sessions behind a computer (approx. 2 hours). Students will learn by applying the different concepts to real data used by historians.

Apart from actively participating in the sessions, students are expected to deliver a take-home assignment after the course is finished. A candidate who satisfactorily passes the course will be able to: - Critically engage with studies relying on quantitative methods. - Conduct original research using these tools in academia, the public, or the business sector. - Continue to develop skills in quantitative methods based on the foundations provided in this course. - Acquire basic competence in R, a statistical software widely used by practitioners in many fields.

3 Pre-requisites

See the reading list PDF.

We will be using R, a widely popular statistical software, which has the added advantage of being free. Please download it in your laptops before coming to Trondheim. In addition, we strongly encourage to complete a short sequence of tutorials to smooth the learning curve, so you are not completely new to the software before the course starts. Instructions for installing R, RStudio, and R tutorials are here.

4 Practicalities

The following script will install (if not already installed) and load all packages used in this tutorial. (there’s bound to be a couple I’ve forgotten, we’ll add these as we go).

if (!requireNamespace("xfun")) install.packages("xfun")
xfun::pkg_attach2("tidyverse", "lubridate", "rvest", "stringr", "readtext", "tesseract", "tidytext", "SnowballC", "wordcloud", "wordcloud2", "widyr", "quanteda", "quanteda.textstats", "wordVectors", "magrittr", "pdftools", "devtools")
# The following two packages have Java dependencies that might give some (especially Windows) machines trouble. No worries if they don't load on your computer, I'll demonstrate running them on RStudio Cloud.
xfun::pkg:attach2("tabulizer", "openNLP")
install.packages("devtools")
devtools::install_github("bmschmidt/wordVectors")

References and Sources

Clark, Michael. 2018. “An Introduction to Text Processing and Analysis with R.” 2018. https://m-clark.github.io/text-analysis-with-R/.
Healy, Kieran. 2018. Data Visualization: A Practical Introduction. Princeton University Press. https://socviz.co/.
Jurafsky, Dan, and James H Martin. forthcoming. “Speech and Language Processing. Vol. 3.” Pearson London London. http://www.web.stanford.edu/~jurafsky/slp3/.
Mitchell, Ryan. 2018. Web Scraping with Python: Collecting More Data from the Modern Web. " O’Reilly Media, Inc.".
Niekler, Andreas, and Gregor Wiedemann. 2020. “Text Mining in R for the Social Sciences and Digital Humanities.” 2020. https://tm4ss.github.io/docs/index.html.
Schmidt, Ben. 2015. “Word Embeddings for the Digital Humanities.” 2015. http://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html.
Schweinberger, Martin. 2021. “POS-Tagging and Syntactic Parsing with R.” 2021. https://slcladal.github.io/tagging.html#2_POS-Tagging_with_openNLP.
Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. "O’Reilly Media, Inc.". https://www.tidytextmining.com/.
Tchalian, Hovig, and Laura Nelson. 2017. “Computational Text Analysis for Humanists and Social Scientists.” 2017. https://github.com/lknelson/text-analysis-2017.
Wickham, Hadley. 2016. “Elegant Graphics for Data Analysis.” https://ggplot2-book.org/.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. " O’Reilly Media, Inc.". https://r4ds.had.co.nz/index.html.


2022.